Okay so we have the three trees, rooted on rh1988 and ag342.
One thing to note here is that the following species get pretty much 0 orthologs. I think a run without them is needed, which should considerably increase the number of orthologs identified.
And here are the single-copy ortholog (SCO) counts at each species-percentage threshold:
90% - greater than 72 species, 0 SCOs.
85% - greater than 68 species, 2 SCOs.
80% - greater than 64 species, 52 SCOs.
75% - greater than 60 species, 164 SCOs.
70% - greater than 56 species, 255 SCOs.
65% - greater than 52 species, 308 SCOs.
60% - greater than 48 species, 355 SCOs.
55% - greater than 44 species, 419 SCOs.
50% - greater than 40 species, 488 SCOs.
So the ortholog thresholds above 80% did not work very well. In the RAxML output, the relationship between the outgroups consistently could not be resolved (as can be seen from the chaos at the bottom of the plot).
So this is with the following species removed because they had pretty much no orthogroups.
The orthogroups identified in this analysis are much better (scroll to the top to compare with the other analysis).
100% - 1 SCO.
90% - greater than 61 species, 143 SCOs.
85% - greater than 58 species, 227 SCOs.
80% - greater than 54 species, 285 SCOs.
75% - greater than 51 species, 314 SCOs.
70% - greater than 48 species, 347 SCOs.
65% - greater than 44 species, 416 SCOs.
60% - greater than 41 species, 455 SCOs.
55% - greater than 37 species, 560 SCOs.
50% - greater than 34 species, 622 SCOs.
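As a sanity check, the species counts listed for each percentage can be reproduced by rounding the percentage of the species total. This is only a sketch: the helper name `min_species` is made up for illustration, the total of 68 is the number of species in the reduced tree file, and the total of 80 for the full run is an assumption inferred from the counts listed for it (for example 72 at 90%).

```r
# Hypothetical helper: number of species corresponding to an occupancy
# percentage, rounded to the nearest whole species.
min_species <- function(n_species, percent) {
  round(n_species * percent / 100)
}

percents <- seq(90, 50, by = -5)

thresholds <- data.frame(
  percent     = percents,
  full_run    = min_species(80, percents),  # 80 is inferred, not stated in the notes
  reduced_run = min_species(68, percents)   # 68 species in the reduced tree file
)

print(thresholds)
```

If the rounded values ever disagree with the cutoffs actually used, the script that does the filtering is the thing to trust, not this table.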
I ran this for 90%, 85%, 80%, and 75% and rooted on
So the tree file has 68 species and the metadata has 94 species.
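Since the two counts differ, it may help to check which labels are in one but not the other before joining the metadata onto the tree. Below is a minimal sketch in base R. The species names, the `label` and `group` columns, and all object names are hypothetical stand-ins; with a real tree the tip labels would typically come from `tree$tip.label` of an ape `phylo` object.

```r
# Hypothetical stand-ins for the real tip labels and metadata table.
tip_labels <- c("sp_A", "sp_B", "sp_C")
metadata <- data.frame(
  label = c("sp_A", "sp_B", "sp_C", "sp_D", "sp_E"),
  group = c("in", "in", "out", "in", "out"),
  stringsAsFactors = FALSE
)

# Metadata rows with no matching tip (species that were dropped from the tree).
not_in_tree <- setdiff(metadata$label, tip_labels)

# Tips with no metadata row (these would be plotted without annotation).
not_in_metadata <- setdiff(tip_labels, metadata$label)

# Keep only the metadata rows that correspond to tips in the tree.
metadata_in_tree <- metadata[metadata$label %in% tip_labels, , drop = FALSE]

print(not_in_tree)
print(not_in_metadata)
print(nrow(metadata_in_tree))
```

With the real data, `not_in_metadata` being empty would mean every tip can be annotated, and the extra metadata rows would just be the species that are not in this tree.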
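library(ggtree)

# Plot the tree with italic tip labels and node numbers on the branches
ggtree(g90rem_data,
       ladderize = TRUE,
       size = 0.5) +
  geom_tiplab(aes(label = label),
              fontface = "italic",
              size = 3,
              align = FALSE,
              family = "times") +
  xlim(-1, 14) +  # leave room for the root edge and the tip labels
  geom_rootedge(rootedge = 1) +
  # node numbers, placed at the branch midpoints
  geom_nodelab(aes(x = branch, label = node),
               size = 4)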
So with the low-orthogroup species removed, this tree is looking way better (from a bioinformatics point of view; biologically I have no idea, sorry lol).
Okay, so I have run trimming and the prokaryotic and viral contamination
filtering (with bit scores 150 and 200), and also removed contigs < 500 bp. I
used the 150 and 200 outputs as input for BUSCO with the fungi_odb12
and ascomycota_odb12 databases.
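For reference, the contig length filter can be sketched in base R as below. This is not necessarily how the filtering was actually done here (a dedicated tool may have been used); the function name `filter_contigs` and the toy sequences are made up for illustration, and the FASTA is assumed to be already read into a character vector, for example with `readLines()`.

```r
# Hypothetical helper: keep only FASTA records whose total sequence length
# is at least min_len. Sequences may be wrapped over several lines.
filter_contigs <- function(fasta_lines, min_len = 500) {
  is_header <- startsWith(fasta_lines, ">")
  record_id <- cumsum(is_header)  # each header line starts a new record

  # Total sequence length per record, summed over its sequence lines.
  seq_lengths <- tapply(nchar(fasta_lines[!is_header]),
                        record_id[!is_header],
                        sum)

  keep_ids <- as.integer(names(seq_lengths)[seq_lengths >= min_len])
  fasta_lines[record_id %in% keep_ids]
}

# Toy example with a small cutoff so the effect is visible.
toy_fasta <- c(">contig_1", "ACGT", "AC", ">contig_2", "AC")
filtered <- filter_contigs(toy_fasta, min_len = 5)
print(filtered)
```

The default `min_len = 500` keeps contigs of 500 bp or longer, which matches removing contigs < 500 bp.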